The Test of English as a Foreign Language (TOEFL) was developed
in 1963 by a National Council on the Testing of English as a Foreign
Language, which was formed through the cooperative effort of over
thirty organizations, public and private, that were concerned with
testing the English proficiency of nonnative speakers of the language
applying for admission to institutions in the United States. In 1965,
Educational Testing Service (ETS) and the College Board assumed joint
responsibility for the program and, in 1973, a cooperative arrangement
for the operation of the program was entered into by ETS, the College 
Board, and the Graduate Record Examinations Board. The membership of 
the College Board is composed of schools, colleges, school systems, 
and educational associations; Graduate Record Examinations Board 
members are associated with graduate education. 

ETS administers the TOEFL program under the general direction of 
a Policy Council that was established by, and is affiliated with, the
sponsoring organizations. Members of the Policy Council represent 
the College Board and the Graduate Record Examinations Board and such 
institutions and agencies as graduate schools of business, junior and 
community colleges, nonprofit educational exchange agencies, and 
agencies of the United States government. 

Abstract

This study examined the performance of two groups of non-native 
English speakers on the Test of English as a Foreign Language (TOEFL) and 
an appropriate verbal aptitude test. One group of graduate applicants took 
both TOEFL and the verbal section of the Aptitude Test of the Graduate 
Record Examinations (GRE). Another group of undergraduate applicants took
TOEFL, the verbal section of the College Board Scholastic Aptitude Test
(SAT), and the Test of Standard Written English (TSWE). Data are presented
showing how native and non-native speakers compare on each set of tests.
Information is also provided to aid in interpreting test results for
non-native speakers who have taken both types of test. The appendix to the
report summarizes item reviews, by specialists in English as a Second
Language, which suggest future directions for TOEFL test development.

The Performance of Non-native Speakers of English
on TOEFL and Verbal Aptitude Tests

Previous studies (e.g., Angoff & Sharon, 1970; Clark, 1977) have
shown that the Test of English as a Foreign Language (TOEFL) clearly
distinguishes between native and non-native speakers of the language. Native
speakers perform exceedingly well on TOEFL, finding little difficulty with
any section of the test. Non-native speakers, on the contrary, consistently
show widely varying achievement on TOEFL, their scores spanning the entire
scale used for the test. Thus the studies of TOEFL agree that the test is
useful in discriminating English-language ability among non-native speakers.
Clearly, the nature and level of TOEFL preclude direct translation of its
scores into scores on verbal aptitude measures commonly used for selection
of native-speaking students. The question remains, however: What is the
relationship between tests which tap these differing aspects of verbal
performance?

The present study examines the performance of non-native speakers of
English on TOEFL and on some verbal aptitude tests designed for native
speakers. For graduate-level students, the aptitude measure used was the
verbal portion of the Graduate Record Examinations Aptitude Test (GRE-V).
For undergraduates, two tests were included: the verbal portion of the
College Board Scholastic Aptitude Test (SAT-V) and the Test of Standard
Written English (TSWE). The two major ways of describing relationships
between TOEFL and the verbal aptitude instruments are, first, to examine
relative levels of performance of native and non-native speakers on the
same test, and, second, to investigate the nature of the relationship
between performance on TOEFL and on the verbal aptitude test by non-native
speakers. Both approaches are reported in this study. How native and
non-native speakers compare is of interest, but differences in level
between the two populations are to be expected. The second approach,
however, has important practical implications. As part of the review
process for admission to United States colleges and universities, scores
on TOEFL and on GRE-V or SAT-V are frequently evaluated for foreign
applicants who are not native speakers of English. For students whose
proficiency in English (as measured by TOEFL) approaches that of native
speakers, there would seem to be little problem in interpreting verbal
aptitude scores. But for students who score below this level,
English-language proficiency may play a significant role in their ability
to cope with verbal aptitude tests written in English. Information that
assesses the effect of the English-language factor in verbal aptitude tests
and that provides some guidance on how results on one type of test can help
interpret results on the other would thus be of considerable interest to
those who must decide on the admission of foreign students to U.S.
institutions of higher learning.

This report first describes the procedures used in the study and then 
presents the basic findings on the candidates' test performance, divided 
into undergraduate and graduate categories. With the performance data, 
we present the analysis of test results, including comparative information 
about native vs. non-native performance on the verbal aptitude tests. 
Included in this analysis are the means, standard deviations, and 
reliabilities, as well as the intertest and intratest correlations. 
The last portion of the analysis section documents the performance of
non-natives on TOEFL in relation to their performance on the verbal 
aptitude tests. 

The appendix to this report summarizes a review, by a panel of
specialists in English as a Second Language, of items from all the tests
used in the study. The purpose of this review was to elicit expert
judgments on the differences among the tests. As can be seen from the
results, the review also stimulated judgments on the relative difficulty of
the separate tests and suggestions for improving the appropriateness of
TOEFL as a measure of English-language proficiency.

PROCEDURES 

Test Selection 

The first step was to select appropriate measures. The SAT and GRE
verbal aptitude tests and the Test of Standard Written English seemed 
obvious choices. Nevertheless, it was felt that prior confirmation of 
what tests are currently used by academic institutions to screen foreign 
applicants would be advisable. Thus a telephone survey of admissions 
officers at 50 U.S. colleges and universities was conducted. The 
institutions were selected to provide a representative sample on the basis 
of size, geographical distribution, and category (public, private, etc.). 
Of the 50 institutions surveyed, four offer only bachelor's degrees. All
the institutions are accredited and have a student population larger than
one thousand. Table 1 summarizes the survey data. For admission of
undergraduate students, 42 of the schools require foreign applicants to
take TOEFL. Eight of the 42 that require TOEFL accept the substitution of
the Michigan English Test or a course in English as a Second Language
(ESL). Twelve of the 50 institutions require the SAT, but four accept the
substitution of the American College Testing program (ACT). Only two
require foreign applicants to take College Board Achievement Tests. All
those who take the SAT also take TSWE, since it is administered along with
the SAT.

See Table 1 on page 27

Of the institutions that offer graduate-level programs, six noted
that admissions are handled exclusively by the various academic departments;
thus no single policy applies to all. Of the remaining 40 schools offering
graduate degrees, thirty-six require TOEFL and only three allow a substitute.
Twenty-four require foreign applicants to take the GRE-V. Only one school
requires GRE but not TOEFL. Thirteen graduate schools require some applicants
to take the Graduate Management Admission Test (GMAT) instead of GRE. Four
schools require the Miller Analogies Test, and one requires the National
Teacher Examinations (NTE) for foreign applicants. Table 2 summarizes the
principal data from the graduate survey. The results of the survey led us
to conclude that SAT-V, GRE-V, and TSWE were indeed appropriate instruments
for the experimental portions of the study and that, although the relationship
of language proficiency and verbal aptitude is of particular importance to
graduate schools, sufficient numbers of undergraduate institutions used
both measures to justify our also including an undergraduate sample.



See Table 2 on page 27 



Design Considerations 

The analysis of student performance on the designated tests was not
based on data from already existing files. We believed that our purposes
could best be met by arranging new test administrations, since we would
avoid the problem of large time lapses between administrations of the two
tests. Although this approach increases the precision with which the
relationships may be described, it should be kept in mind that motivational
factors might differ from those in effect in operational administrations of
the tests. Admissions decisions were not being made on the basis of these
aptitude scores, and thus perceived pressure to perform well may have been
lower. Furthermore, some degree of self-selection operated among those
agreeing to take the aptitude tests. Therefore, the relationships reported
here should be taken as provisional, pending confirmation by findings based
on operational administrations of the tests.

Sample Selection 

After test centers were identified where supervisors agreed to give
the experimental tests (GRE-V, SAT-V, and TSWE) in the afternoon following
the regular morning TOEFL administration, candidates were asked if they
would participate in the study. Approximately 600 candidates were
approached, equally divided between those applying to undergraduate and
graduate institutions. Of these, 415 students agreed to participate, took
TOEFL in the morning, and returned to take one of the experimental tests
in the afternoon. Because of an irregularity in test administration at
one center, some of the scores could not be used in the experiment. As a
result, a group of 210 undergraduate-level students and a group of 186
graduate-level students were available for study.

The following data, based on responses to questions asked on the day
of the test, describe the experimental groups:

Sex. Each group, undergraduate and graduate, is about 65 percent
male, 35 percent female.

Age. The median age is 29 for the graduate group and 21 for the
undergraduate group. The spread of ages is greater for the graduate-level
group. The median age category of the graduate group extends
from 25 to 32 years, whereas the median age category of the undergraduate
group extends from 19 to 24 years.

Years of English study. The median number of years of English study
is 7 for the graduate group, 6 for the undergraduate group.

Months in the U.S. In response to the question "How long have you
been in the United States?" the graduate-level group reported an
average of 13 months and the undergraduate group an average of 9
months.

Language spoken outside of class. The participants were asked to
indicate whether they usually spoke English or their native language
outside of class. About 60 percent of each group marked "Native
Language," and about 32 percent marked "English." The remainder
marked both or did not respond to the question.

Native country. Forty-two different native countries were listed by
the graduate group, and fifty were listed by the undergraduate group.
The largest number from the same country is 30 (i.e., 16%) for the
graduates and 45 (i.e., 21%) for the undergraduates. The eight
largest national groups in each sample are listed in Table 3.

See Table 3 on page 28 

Native language. Thirty-five different languages were listed by the
graduate group and thirty by the undergraduates. The largest group
speaking a single language was 26 (Farsi, 14%) for the graduates and
47 (Farsi, 22%) for the undergraduates. The ten largest language
groups in each sample are listed in Table 4.

See Table 4 on page 28 

We did not wish to reveal the identity of the GRE, SAT, and TSWE; 
therefore, in all test booklets used and in all correspondence with students 
and supervisors, the tests were referred to as "Experimental Test-Graduate 
Level" and "Experimental Test-Undergraduate Level." 

Test Administration 

All 396 students in the final sample, tested at thirteen test centers
throughout the United States, took two tests. All took TOEFL
in the morning, following the normal procedures for that testing program. 
In the afternoon, the 186 graduate applicants returned to take the graduate- 
level experimental test (GRE-V), and the 210 undergraduate applicants took 
the undergraduate-level experimental test (SAT-V and TSWE). 

RESULTS AND DISCUSSION

In analyzing the results of the tests taken by the subjects in this
study, we first considered the representativeness of the two groups,
graduate and undergraduate, in relation to the TOEFL population as a whole.
This could best be checked by comparing the performance on TOEFL of the
two groups of non-native speakers who participated in this study with
the performance of a representative group of other non-native speakers who
took the same form of the test on the same date in May 1977. Second, to
address the major questions raised in this study, we compared the performance
of the two groups on the respective "other" test(s) with the performance of
native speakers who took the same forms of the tests. To this end, the
graduate and undergraduate groups were analyzed separately.

In order for us to compare the performance of the non-native subjects 
across tests, it was useful to look at basic statistical data for the three 
pairs of tests: TOEFL and GRE-Verbal, TOEFL and SAT-Verbal, and TOEFL and 
TSWE. In this portion of the discussion we therefore include means and 
standard deviations as well as correlation coefficients between TOEFL and 
the other tests. The overall distributions across tests are presented as 
scatterplots for the two groups involved; they provide information on how 
scores on one test can be used to interpret scores on the other. 

Representativeness of the Sample 

If the performance of the experimental groups were to indicate that 
they did not represent the typical population of non-native English speakers 
who take TOEFL, any analysis and interpretation of results from this study 
would be of questionable generalizability. In fact, the score distributions
of both the graduate and undergraduate groups that participated in the 
study were reasonably representative of the general TOEFL population, 
although they were somewhat higher. This conclusion can be derived from a 
direct comparison of the two groups with other non-native speakers who took 
the same form of TOEFL on the same date in May 1977. Of the total number 
of 6,291 such persons participating in that administration at centers 
around the world, a representative sample of 1,540 cases was used to 
compile test data for that form of TOEFL. As shown in Table 5, both the 
graduate and undergraduate experimental groups performed better than the 
statistical sample did. The mean score for the experimental graduate group 
was a full 30 points higher than the mean score for the sample (523 vs. 493).
The undergraduate mean was 9 points higher than the sample's (502 vs.
493). Although to a lesser extent, the mean scores achieved by both
experimental groups for each of the three sections reported by TOEFL 
were also higher than the corresponding mean scores for the sample. 

See Table 5 on page 29 

Further evidence of how the groups compare with other TOEFL candidates 
can be found in the data from all administrations of the test, worldwide, 
conducted from September 1976 to May 1977. For 50,072 graduate students 
in this category, the mean score was 506. For 44,149 undergraduates, the 
mean score was 502. Again we find that our two experimental groups can 
reasonably be considered as representative of TOEFL candidates, the 
undergraduate group being the more representative of the two, as would 
be consistent with their shorter average period of residence in the 
U.S. 

Native vs. Non-native Comparisons 

The next question concerns the performance of the experimental groups 
on tests other than TOEFL. Looking first at the data for the graduate 
group, we would do well to recall that GRE-V was not designed, as TOEFL 
was, to measure English proficiency, nor was it designed with non-native 
speakers in mind. Thus, it should not be expected that effective comparisons
of proficiency could readily be made of groups or of individuals who had
taken GRE-V and TOEFL. Nevertheless, since many non-native English
speakers do take the GRE Aptitude Test and subsequently must have their
verbal scores reviewed by admissions offices and academic departments, it
is helpful to see how their performance compares with that of native
speakers who take the same tests.

In this case, there is no control group representing native speakers
who took the GRE test and TOEFL at the same time. The data used for
comparison here are taken from an analysis of the performance of a
representative sample of 1,495 native speakers who took the same form of
the GRE Verbal Aptitude Test at the same time (May 1977).

As can be seen in Table 6, the graduate candidates in the non-native
group, although they were typical of the TOEFL population, scored much 
lower on GRE-Verbal than the native-speaking sample did. 

See Table 6 on page 30 

Scores from GRE-Verbal are reported on a 200-900 scale. Within this
range, the native speakers had a mean score of 514. The non-natives,
however, had a mean score of only 274. Clearly, scores that cluster
near the bottom of the scale do not lend themselves to easy interpretation,
particularly in a multiple-choice testing situation, in which blind guessing
yields an expected score of 200, with a standard deviation of about 70.
The primary conclusion we can draw from these results is that GRE-V is
far too difficult for most non-native speakers of English.
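
The chance-level floor mentioned above can be made concrete at the raw-score level. The sketch below treats blind guessing as a binomial process and computes the mean and standard deviation of the number-right score. The five-option item format and rights-only scoring are assumptions chosen for illustration, and the raw-to-scaled conversion that produces the 200 and 70 figures quoted above is not modeled.

```python
import math


def chance_raw_score(n_items: int, n_options: int) -> tuple[float, float]:
    """Mean and SD of the number-right score when every item is answered
    by blind guessing (binomial model; rights-only scoring is ASSUMED)."""
    p = 1.0 / n_options                        # chance of a lucky hit per item
    mean = n_items * p                         # binomial mean: n * p
    sd = math.sqrt(n_items * p * (1.0 - p))    # binomial SD: sqrt(n * p * q)
    return mean, sd


# Hypothetical example: a 100-item verbal section with five options per item.
mean, sd = chance_raw_score(100, 5)
print(f"expected raw score {mean:.0f}, SD {sd:.1f}")  # expected raw score 20, SD 4.0
```

Under these assumptions about a fifth of the items are answered correctly by luck alone, so raw scores near 20 carry little information about verbal ability.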

No figures are given here for subscores, since only total scores
are reported for GRE-V. This test does, however, contain two different
types of item. Of the 100 items in GRE-V,* 60 measure verbal reasoning,
including analogies, antonyms, and sentence completions. The remaining 40
are reading comprehension items requiring the candidate to respond to a
variety of questions based on prose passages. To obtain precise information
on how the non-native subjects compared to the native sample in their
ability to cope with the two separate categories of items, we looked at raw
scores rather than scaled scores, that is, at the actual number of items
correctly answered on the test. Table 7 indicates the means and standard
deviations for the native and non-native groups on each of the two
subparts of GRE-Verbal. The difference between the two groups is, for all
practical purposes, the same for the verbal reasoning and the reading
sections. The data do confirm the earlier evidence, in the form of subpart
means, that native speakers performed much better than non-native speakers.

*In October 1977, the GRE Aptitude Test was restructured. The Verbal
section was reduced from 100 items to 80 items, and it is now timed for 50
minutes rather than 75. However, scores on the new and old formats are
comparable.



One further consideration in comparing the performance of the two
groups on GRE-V is that of speededness. Once again it was useful to
look at the two sections of the test separately. In fact, we found a
greater difference between the groups when we separated the sections.
In Table 8 we note that the speededness factor appears to have a similar
effect on both groups in the set of 60 verbal-reasoning items. A much
clearer difference appears, however, in the set of 40 reading items.
Even for native speakers of English, GRE-V is speeded in the sense that
a fairly large number of candidates do not complete the test. But in
these reading comprehension items, non-native speakers seemed to have even
greater difficulty in completing the test than did the native speakers.

See Table 7 on page 30

See Table 8 on page 30

A possible factor in the performance difference discussed above is 
the effect of the reading load on non-native speakers. Although the 
non-native speakers do not seem to require any more time than the native 
speakers do to process discrete items like those contained in the verbal 
analogies or antonyms, this does not appear to be the case for the reading 
comprehension items. No definite conclusions can be drawn on this point 
as a result of this study. However, the data shown here do point to a 
variable that could well be significant in all considerations of non-native 
speakers' performance on tests oriented to native speakers. 
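
Speededness comparisons of this kind rest on counting how far each candidate got through a timed section. The sketch below computes three descriptive indices from a hypothetical response matrix: the percentage of candidates completing the section, the percentage reaching three-quarters of the items, and the mean number of items reached. It assumes the common convention that blanks after a candidate's last answered item count as not reached, while earlier blanks count as omitted. The function names and toy data are illustrative only and are not the procedure or data behind Tables 8 and 10.

```python
from typing import Optional, Sequence

Response = Optional[str]  # None marks an item left blank


def items_reached(responses: Sequence[Response]) -> int:
    """Position of the last answered item. Trailing blanks count as
    'not reached'; blanks before a later answer count as omitted (ASSUMED)."""
    for i in range(len(responses) - 1, -1, -1):
        if responses[i] is not None:
            return i + 1
    return 0


def speededness_summary(group: Sequence[Sequence[Response]]) -> dict[str, float]:
    """Hypothetical descriptive speededness indices for one timed section."""
    n_items = len(group[0])
    reached = [items_reached(r) for r in group]
    n = len(reached)
    return {
        "pct_completing": 100.0 * sum(k == n_items for k in reached) / n,
        "pct_reaching_75pct": 100.0 * sum(k >= 0.75 * n_items for k in reached) / n,
        "mean_items_reached": sum(reached) / n,
    }


# Toy data: four candidates on a hypothetical 8-item section.
toy_group = [
    ["A", "B", "C", "D", "A", "B", "C", "D"],          # finished
    ["A", "B", "C", "D", "A", "B", None, None],        # reached 6 of 8
    ["A", None, "B", None, None, None, None, None],    # reached 3 of 8
    ["B", None, "C", "D", "A", "A", "B", "E"],         # one omit, but finished
]
print(speededness_summary(toy_group))
```

Computing the same indices separately for native and non-native groups, and separately for each timed section, gives the kind of comparison summarized in the speededness tables.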

At the undergraduate level, similar comparisons were made. Here our
point of comparison was the group of 232,021 native speakers who took the
same form of the Scholastic Aptitude Test in December 1974. Data from a
representative sample of 1,765 candidates were used to make the native vs.
non-native speaker comparison. Since the SAT-V and TSWE were administered
during the same administration to both the non-native group that participated
in the study and the native-speaker group, the results of both tests are
included. Table 9 displays the summary test data for the native and
non-native groups.

See Table 9 on page 31 

Although the verbal section of the SAT is reported on a 200-800 scale and
GRE-V on a 200-900 scale, the results indicate a relationship
between the native and non-native undergraduates similar to that of
the corresponding graduate groups. The mean score of 269 achieved by the
non-native group on SAT-V reveals once again that this group found the test
very difficult, since the scores cluster near the low end of the scale.
The principal difference between the undergraduate and graduate groups is
in the mean score for the native speakers. Since the undergraduate native
speakers achieved a mean score of 425 on SAT-V, as opposed to the 514 mean
score achieved by the graduate native speakers on GRE-V, the non-native
undergraduates in our study appear to be not so far below (1.5 standard
deviations) their native-speaking counterparts as the graduate group was
(almost 2 standard deviations).
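
The comparison in standard-deviation units can be checked with simple arithmetic. The text does not state which group's standard deviation serves as the unit, so the sketch below only back-calculates the standard deviation implied by the quoted means and distances. The function names are hypothetical, and the results are derived figures, not values taken from Table 6 or Table 9.

```python
def sd_units(mean_high: float, mean_low: float, sd: float) -> float:
    """Distance between two group means, expressed in standard deviations."""
    return (mean_high - mean_low) / sd


def implied_sd(mean_high: float, mean_low: float, n_sds: float) -> float:
    """Standard deviation implied if the gap between means equals n_sds SDs."""
    return (mean_high - mean_low) / n_sds


# Scaled-score means quoted in the text; the SD unit itself is NOT given there.
undergrad_sd = implied_sd(425, 269, 1.5)   # SAT-V: 156-point gap
graduate_sd = implied_sd(514, 274, 2.0)    # GRE-V: 240-point gap
print(undergrad_sd, graduate_sd)           # 104.0 120.0
```

Because the graduate gap is described as "almost" two standard deviations, the implied GRE-V unit is somewhat larger than 120 points.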

Two observations must again be stressed about these verbal aptitude
tests. First, they are measures of ability to do undergraduate or graduate
work, not language proficiency tests in the sense that TOEFL is. Second,
neither verbal aptitude test is designed for non-native speakers. Both
considerations must thus be kept in mind when interpreting these results.

TSWE, whose scale ranges from 20 to 60+, is used for placing entering 
college students in appropriate freshman English classes. It is a language 
test and more closely approximates TOEFL than does either GRE-V or SAT-V. 
Nevertheless, a large discrepancy remains between the native and non-native 
speakers with regard to their performance on TSWE. Quite probably, the 
results should not be interpreted in the same way for both groups. With
reference to Table 9, it is important to note that the reliability of TSWE 
is very nearly the same for both groups. This was not true for either the 
SAT-V or GRE-V. That both of the latter tests exhibited low reliability 
for the non-native groups is important to consider when we make our 
overall comparisons of test performance. 

When we compiled data on the tests, subscores were not identified for
the SAT group. Therefore, it is not possible to discuss any differences 
that may exist between performance on discrete vocabulary items vs. reading. 
For the question of test speededness, however, data are available to 
compare the non-native group's speed in coping with the SAT-V and TSWE
with that of the native-speaking group. Table 10 shows the comparative 
figures for both groups. The two sections of the SAT represent separately 
timed sections, each containing both vocabulary and reading items. The 
first section contains 45 items. The second contains 40 items. What is 
significant about these data is that, by usual measures of speededness, 
the non-native speakers encountered little more difficulty in meeting the 
time requirements of the test than the native speakers did. It is clear 
that the test is speeded for both groups. For the TSWE there is slightly 
more of a difference between the groups, at least in the percentage of 
candidates completing the test (75% of the natives vs. 65% of the
non-natives). Even though TSWE is more closely related to TOEFL (at least to
some of its sections) with regard to test content, completing the 50 items 
in the time allotted apparently introduces more speed demands on non-native 
than on native speakers. 



See Table 10 on page 31 



Test Relationships: Non-natives

To this point, the discussion and analysis of test results have focused
on how the performance of experimental groups of non-native speakers on the
"other-than-TOEFL" tests compared with that of native speakers on the same
forms of those tests. The principal conclusion was that these tests are
difficult for non-natives and that, because their scores tend to cluster in
the low ranges of the scales used by those tests, interpretation of scores
could be complicated.

We turn our attention now to the relationship between TOEFL and the 
other tests by looking at the correlations among them. Here we concern 
ourselves only with the two non-native groups of graduates and under- 
graduates who participated in the study. Table 11 gives the data related 
to TOEFL and GRE-V for the graduate group. 

See Table 11 on page 32 

The overall correlation coefficient of .645 between TOEFL and GRE-V 
would seem to indicate that the two tests are to some extent related but 
are by no means identical in the skill being tested. If the part scores
are considered, one additional point appears noteworthy. The listening
comprehension section of TOEFL shows the lowest correlation with the GRE, a
finding to be expected, since listening comprehension skills are not tapped 
in either of the two parts of GRE-V. The point worth noting is that in 
TOEFL the listening comprehension section shows a similar relationship to 
the other sections. No major difference appears in the relationship of the 
other two sections of TOEFL to GRE-V. 

Looking at the correlation coefficients between TOEFL and the undergraduate
tests, we find evidence of an increasing relationship. The .681
correlation coefficient between TOEFL and SAT-V totals (see Table 12) is
slightly higher than that found between TOEFL and GRE-V. Similarly,
the .720 correlation coefficient between the TOEFL total and TSWE is
indicative of a closer relationship between those two tests than between
TOEFL and either of the two aptitude tests. This follows from TSWE's
being a test of language ability, particularly of written English.
Support for this assertion can be found in the fact that the highest 
part-score correlation coefficient between TOEFL and TSWE is that for the 
second section of TOEFL and TSWE (.708). The items used in this section 
of TOEFL most closely resemble those used on TSWE. From a similar point 
of view, it is the third section of TOEFL which shows the greatest relation- 
ship to SAT Verbal. Here the reading and vocabulary items in that section 
of TOEFL resemble the format of items used in SAT-V. As with GRE-V and 
TOEFL, the listening comprehension section of TOEFL shows the lowest 
relationship to either TSWE or SAT-V. 

See Table 12 on page 32 



Overall Test Comparisons 

The principal method of describing the performance of the experimental
groups has been to present the correlations between the TOEFL scores and
scores on the other graduate or undergraduate measures. A difficulty,
however, is the radically different performance of the groups on the two
tests. In this section we explain the nature of the statistical problem
and describe certain statistical procedures we adopted to explore this
problem and to go beyond these correlation coefficients. At the same
time, we have felt the need to provide data that can support some broader
comparative statements about how the tests in question are related. Thus,
by means of the scatterplots presented here, we are able at least to make
some tentative claims about how TOEFL scores may be used to identify
thresholds of relationships to scores on GRE-V and SAT-V.
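
One simple way to read thresholds off a scatterplot like the ones described here is to group candidates into TOEFL score bands and summarize the verbal aptitude scores within each band. The sketch below is an illustration under stated assumptions, not the report's procedure: the 50-point band width, the chance-level cutoff, and the seven data points are all hypothetical.

```python
from collections import defaultdict
from statistics import median


def band_summary(toefl, verbal, band_width=50, chance_level=None):
    """Group candidates by TOEFL band and summarize verbal scores per band.

    Returns {(band_low, band_high): {"n", "median_verbal", ["pct_above_chance"]}}.
    band_width and chance_level are HYPOTHETICAL parameters.
    """
    bands = defaultdict(list)
    for t, v in zip(toefl, verbal):
        low = (t // band_width) * band_width   # lower edge of the TOEFL band
        bands[low].append(v)
    summary = {}
    for low in sorted(bands):
        scores = bands[low]
        row = {"n": len(scores), "median_verbal": median(scores)}
        if chance_level is not None:
            above = sum(v > chance_level for v in scores)
            row["pct_above_chance"] = 100.0 * above / len(scores)
        summary[(low, low + band_width - 1)] = row
    return summary


# Hypothetical TOEFL and SAT-V scores for seven candidates.
toefl = [420, 440, 480, 510, 530, 560, 590]
sat_v = [200, 210, 230, 250, 300, 380, 420]
for band, row in band_summary(toefl, sat_v, chance_level=250).items():
    print(band, row)
```

The band in which the median verbal score first rises clearly above chance is one candidate for the kind of threshold the scatterplots are meant to reveal.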

The undergraduate results can be used to exemplify the problem. If
one examines the reliabilities and intercorrelations for this group, it
is apparent that, although TOEFL and TSWE are within a comparable range,
the difficulty of SAT-V was inappropriate for measurement in most of the
non-native speaking group. Although the reliability for the TOEFL test was
an encouraging .94, the total SAT reliability for this group was only .11.
With raw-score means in the lower one-third of U.S. descriptive statistics,
the experimental group is near the lower extreme of the SAT scale. The
standard deviation for the non-native speaking group is less than two-thirds
that for the English-speaking SAT sample. The Kuder-Richardson 20 reliability
estimate is related to the range of scores in a group, and it is lower in
more homogeneous samples. Although standard errors of measurement are
similar across groups, this restriction of variation depresses correlations.
The correlations obtained may therefore reflect these differences in
relative difficulty and range of the instruments for the groups, the floor
effects on SAT-V attenuating any relationship to TOEFL. In spite of
the distinct difficulties and the restricted variation of the experimental
sample, the correlation of .68 between TOEFL total and SAT-V is substantial.
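
The point about KR-20 and homogeneous samples can be illustrated directly. The sketch below computes Kuder-Richardson formula 20 from a matrix of 0/1 item scores, together with the usual standard error of measurement, SD × √(1 − reliability). Population (N-denominator) variances are used throughout, which is one common convention; the five-examinee, four-item data set is invented. Dropping the highest- and lowest-scoring examinees makes the toy group more homogeneous, and KR-20 falls from 0.800 to about 0.444.

```python
import math
from statistics import pstdev, pvariance


def kr20(item_scores: list[list[int]]) -> float:
    """Kuder-Richardson formula 20 for 0/1 item scores.

    Rows are examinees, columns are items:
        KR-20 = k / (k - 1) * (1 - sum(p_j * q_j) / var(total))
    Population (N-denominator) variances are used throughout.
    """
    n_people = len(item_scores)
    k = len(item_scores[0])
    var_total = pvariance([sum(row) for row in item_scores])
    if var_total == 0:
        raise ValueError("total scores do not vary; KR-20 is undefined")
    sum_pq = 0.0
    for j in range(k):
        p = sum(row[j] for row in item_scores) / n_people  # proportion correct
        sum_pq += p * (1.0 - p)
    return (k / (k - 1)) * (1.0 - sum_pq / var_total)


def sem(sd_total: float, reliability: float) -> float:
    """Standard error of measurement: SD * sqrt(1 - reliability)."""
    return sd_total * math.sqrt(1.0 - reliability)


# Invented responses (hypothetical): five examinees, four items, 1 = correct.
full = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
]
middle = full[1:4]  # drop top and bottom scorers: a more homogeneous group

r_full, r_middle = kr20(full), kr20(middle)
sd_full = pstdev([sum(row) for row in full])
print(f"KR-20: full {r_full:.3f}, restricted {r_middle:.3f}")
print(f"SEM (full group): {sem(sd_full, r_full):.3f}")
```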

Figure 1 gives a scatterplot of TOEFL and SAT-V scores, revealing the 
characteristic pattern of two tests of widely dissimilar difficulties. If 
the underlying relationship between true scores is linear, as in Figure 2,
but one or both tests are truncated at one end of their range, the resulting
observed relationship appears triangular or curvilinear, as in Figure 3.

Although information is irretrievably lost when genuinely different 
abilities at the lower levels of the SAT range are mapped into a small 
range of "chance level" scores, transformations of the scales can partially 
straighten out the artificial curvilinearity or piecewise linearity 
induced by the distinct difficulty levels of the tests. While they do 
not retrieve the value of the correlation that would have been obtained 
in the absence of the floor effect, if such transformations increase the 
correlation, it may be concluded that the true relation is higher than the 
obtained one.

In the undergraduate sample, TSWE was of appropriate difficulty, 
yielding a nearly linear relationship (Figure 4); transformation of its 
scale would not be expected to increase the relationship. However, 
transforming scales for the TOEFL-SAT-V relationship might be expected 
to increase correlations. Two transformations were investigated: log 
SAT-V vs. TOEFL, and a correlation based on only those cases corresponding 
to median SAT-V scores above the chance level. The log transformation has 
the effect of "squeezing" the upper portion of the SAT scale and of 
tending to straighten out nonlinear relationships which exhibit increasing 
slope. The latter approach, "truncation," or trimming, corresponds to a 
piecewise linear fit, discarding those cases in the OP region of Figure 3 
and fitting only those cases clustering around line PQ.
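The two transformations can be sketched as follows. This is an illustration under assumptions, not the study's code: the helper names and the toy data (a score that sits at a floor of 200 and then rises linearly from TOEFL 470) are hypothetical, and the truncation point is supplied by hand rather than estimated. The log version correlates TOEFL with the natural log of the aptitude score; the truncated version recomputes an ordinary correlation on only the cases at or above the truncation point, which amounts to fitting just the PQ segment of Figure 3.

```python
import math


def pearson(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def log_correlation(toefl, aptitude):
    """Correlate TOEFL with log(aptitude), squeezing the top of the aptitude scale."""
    return pearson(toefl, [math.log(s) for s in aptitude])


def truncated_correlation(toefl, aptitude, cut):
    """Correlate only the cases with TOEFL >= cut (discards the OP region)."""
    kept = [(t, s) for t, s in zip(toefl, aptitude) if t >= cut]
    xs = [t for t, _ in kept]
    ys = [s for _, s in kept]
    return pearson(xs, ys)


toefl = list(range(350, 661, 10))
# Hypothetical floor-then-rise shape: 200 until TOEFL 470, then linear.
aptitude = [max(200, 200 + 2 * (t - 470)) for t in toefl]

print(round(pearson(toefl, aptitude), 3))
print(round(log_correlation(toefl, aptitude), 3))
print(round(truncated_correlation(toefl, aptitude, cut=470), 3))  # 1.0 on this toy data
```

On real scores the truncated coefficient will of course fall short of 1; the toy only shows that trimming removes the flat segment that depresses the observed coefficient.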

As expected, neither the log nor the truncation transformation 
increased the TSWE-TOEFL correlations. Table 13 shows the correlations of 
observed scores, log TSWE vs. TOEFL, and TSWE vs. truncated TOEFL. 

See Table 13 on page 36



The log transformation has essentially no effect on correlations with 
TSWE, and truncation actually decreases the correlations. In contrast, as 
shown in Table 14, the log transformation does yield very small increases in 
correlations between TOEFL I, III, and Total and the SAT-V scores, as does 
truncation for TOEFL I and TOEFL Total. Neither transformation increases 
the correlation of TOEFL II (structure and written expression) with SAT-V. 

See Table 14 on page 36

These results suggest that the true relationship between SAT-V and 
both the TOEFL listening and the TOEFL reading and vocabulary subtests is 
somewhat higher than the observed relationship would indicate, but that the 
relationship of the TOEFL structure and written expression subtest to 
SAT-V is not underestimated by observed score correlations. However, the 
changes in correlations are too small to be of practical significance.

In the graduate population, neither the discrete verbal nor the 
reading comprehension sections of GRE-V yielded raw score means as high as 
12% of the total number of items for the non-native English-speaking 
samples. These scores are well below those to be expected by chance if 
candidates had attempted all items. In this situation, we would expect 
scale transformations to lead to increased correlations between TOEFL and 
the total GRE verbal scores.
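The comparison with chance can be checked directly. The sketch assumes five-option items scored number-right (an assumption, since the item format is not restated here), so that a candidate guessing on every item would expect one fifth of the items correct. Item counts come from Table 8 (60 verbal reasoning items, 40 reading items) and the non-native raw-score means from Table 7 (5.10 and 3.87); the function name is hypothetical.

```python
def chance_score(n_items, n_options=5):
    """Expected number-right score if every item is answered at random.

    n_options=5 is an assumed item format, not a figure from the report.
    """
    return n_items / n_options


# section: (number of items, non-native raw-score mean), from Tables 8 and 7
sections = {"verbal reasoning": (60, 5.10), "reading": (40, 3.87)}

for name, (n_items, mean) in sections.items():
    print(f"{name}: mean {mean} = {mean / n_items:.1%} of items; "
          f"chance = {chance_score(n_items):.1f} items")
```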

Table 15 shows that these expectations are confirmed. In this table, 
two additional transformations (1/GRE-V vs. TOEFL and GRE-V vs. TOEFL³) 
are introduced in an attempt to straighten the marked curvilinearity 
apparent in Figure 5. These transformations have effects similar to the 
log transformation but by the route of stretching the TOEFL scale (extreme 
stretching for TOEFL³, which raises TOEFL scores to the third power) and 
are applicable to curvatures even more pronounced than those which may be 
rectified by the logarithmic transformation.
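The effect of stretching the upper end of the TOEFL scale can be illustrated on toy data with the same floor-then-rise shape as Figure 5. The values are hypothetical (a GRE-like score held at 200 until TOEFL 470, then rising linearly), so the sketch shows only the direction of the effect, not the sizes reported in Table 15. Cubing TOEFL spreads the high scores apart more than the low ones, bending the horizontal axis to follow the increasing slope, so the correlation rises.

```python
def pearson(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


toefl = list(range(350, 661, 10))
# Hypothetical GRE-like score: at a floor of 200 up to TOEFL 470, then rising.
gre = [max(200, 200 + 2 * (t - 470)) for t in toefl]

r_observed = pearson(toefl, gre)
r_cubed = pearson([t ** 3 for t in toefl], gre)  # the "GRE-V vs. TOEFL^3" variant
print(f"observed r = {r_observed:.3f}; TOEFL^3 r = {r_cubed:.3f}")
```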
See Table 15 on page 38



Here we see a regular increase, with the rather extreme stretching of 
the top portion of the TOEFL scale obtained by cubing TOEFL scores yielding 
the greatest increase in correlations.

Examining the observed score and TOEFL³ correlations with the two GRE 
verbal subsections, vocabulary and reading, as shown in Table 16, we 
see a similar pattern. Scale transformation improves the correlations of 
the discrete verbal reasoning items in GRE-V with TOEFL more than those of 
the reading items. The improvements are most pronounced for either GRE 
subscore with TOEFL III (reading comprehension and vocabulary).



See Table 16 on page 38 



It is possible that more extreme transformations would further 
increase the correlations, but the point that floor effects on the GRE 
distort the true relationship has been amply documented.

For TOEFL reading and vocabulary and the total, truncation does as 
well as cubing the TOEFL score, and examination of the points of truncation 
offers a rough estimate of the minimum TOEFL score at which GRE scores 
begin to become interpretable. Table 17 presents the estimated truncation 
points (corresponding to point P in Figure 3), that is, those TOEFL scores 
at which GRE scores begin to rise from their floor and to exhibit positive 
correlations with TOEFL scores.

These truncation points might be thought of as minimal values of 
TOEFL scores for which it makes sense to examine a candidate's GRE verbal 
score.
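One way to locate such a truncation point is sketched below under stated assumptions: candidates are grouped into 20-point TOEFL bins starting at 234 (matching the column intervals of the scatterplots), the median GRE verbal score is taken in each bin, and the lower edge of the first bin whose median rises above an analyst-supplied floor is reported. The function name, bin settings, and toy data are hypothetical, and the report does not spell out its exact procedure, so this illustrates the idea rather than reproducing Table 17.

```python
import statistics


def truncation_point(toefl, gre, floor, bin_width=20, start=234):
    """Lower edge of the first TOEFL bin whose median GRE score exceeds `floor`.

    Returns None if no bin's median rises above the floor.
    """
    bins = {}
    for t, g in zip(toefl, gre):
        edge = start + ((t - start) // bin_width) * bin_width
        bins.setdefault(edge, []).append(g)
    for edge in sorted(bins):
        if statistics.median(bins[edge]) > floor:
            return edge
    return None


# Toy data: GRE-like scores sit at 200 until TOEFL 500, then climb.
toefl = list(range(350, 661, 5))
gre = [max(200, 200 + 2 * (t - 500)) for t in toefl]
print(truncation_point(toefl, gre, floor=200))  # 494
```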
See Table 17 on page 39

The information in this table may be summarized by suggesting that 
below a TOEFL score of about 475, differences in GRE verbal scores are 
unlikely to be interpretable. Following similar procedures, we can suggest 
that SAT verbal scores are unlikely to be informative below TOEFL scores of 
about 435. Here the TOEFL cut score is lower, suggesting that SAT verbal 
scores are likely to be interpretable for a larger proportion (perhaps 80%) 
of the TOEFL undergraduate candidates. Interpretability does not mean 
equivalence, however, and even the TOEFL-TSWE correlations, least distorted 
by floor effects, show that the two tests share only 52% of their variance, 
thus suggesting that the instruments are far from interchangeable.
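The 52% figure is the squared correlation between TOEFL total and TSWE (.720 in Table 12), read as the proportion of score variance the two tests have in common:

```latex
r^2 = (0.720)^2 = 0.5184 \approx 52\%, \qquad 1 - r^2 \approx 48\% \ \text{not shared}
```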

GENERAL CONCLUSIONS 
A number of important conclusions can be drawn from this 
study. It is clear, first of all, that non-native English speakers do not 
perform as well on the GRE and SAT verbal aptitude tests or on the TSWE as 
they do on TOEFL. This was to be expected given the nature and purpose of 
these tests. The data provided here, however, do show how non-native 
speakers compare with native speakers in performance on three tests other 
than TOEFL. With this information, score reports from these tests can be 
interpreted more easily.

In this regard, the most useful result of the study was the identification 
of score levels on TOEFL at which scores on the other tests begin 
to be meaningful. This information would, of course, be useful only for 
students who have taken both TOEFL and the other test(s). But since most 
foreign students who apply to colleges and universities fall in this 
category, the information should be valuable for the admissions process.

The review of test items by the panel of specialists in English as a 
Second Language does not, of course, constitute experimental data (see 
Appendix). Nevertheless, the comments by the reviewers provided important 
supporting information for the future refinement of TOEFL. In particular, 
the comments on the length and nature of the reading passages and related 
items used in TOEFL and the other tests have yielded valuable information 
for future test construction.

The most significant result of this study, which relates to both the 
item review and candidate performance, is the way in which the two coincide. 
Regarding overall performance, the GRE verbal test proved to be the most 
difficult for the non-native speaker candidates. The next most difficult 
was the SAT verbal, and third was TSWE. Looking at the comments of the 
reviewers, we find that their order of preference for items is exactly the 
same.

This study represents the first significant attempt to compare 
performance on TOEFL with that on tests like the verbal aptitude tests 
included here. All the conclusions reached in this study should prove 
useful for interpreting foreign student performance. Additional studies 
will no doubt raise more specific questions and attempt to reach even 
more practical conclusions than was possible in this study. 
TABLE 1

UNDERGRADUATE-LEVEL ADMISSIONS TESTS
REQUIRED FOR FOREIGN APPLICANTS

Test              Number of Institutions   Percent
TOEFL(a)                                   84%
SAT               12(b)                    24%
CB-Achievement    2                        4%

(a) Eight institutions accept the Michigan English
Test or an ESL course in place of TOEFL.
(b) Four institutions accept ACT in place of SAT.


TABLE 2

GRADUATE-LEVEL ADMISSIONS TESTS
REQUIRED FOR FOREIGN STUDENTS

Test              Number of Institutions   Percent
TOEFL             36(a)                    90%
GRE-V             24(b)                    60%

(a) Three institutions accept a substitute test or ESL
course for TOEFL.
(b) Thirteen institutions require GMAT instead of GRE
for some applicants.
TABLE 3

NATIVE COUNTRIES AND NUMBERS OF PARTICIPANTS
(Eight largest groups)

Graduate Group                    Undergraduate Group
Country               N          Country               N
India                30          Iran                 45
Iran                 24          Hong Kong            24
Philippines          17          Japan                16
Korea                11          Vietnam              11
Vietnam              10          Indonesia             9
Japan                10          China                 8
China                10          Nigeria               8
Thailand              9          Korea                 7
Other countries      65          Other countries      82
Total               186          Total               210



TABLE 4

NATIVE LANGUAGES AND NUMBERS OF PARTICIPANTS
(Ten largest groups)

Graduate Group                    Undergraduate Group
Language              N          Language              N
Farsi (Persian)      26          Farsi (Persian)      47
Gujarati             14          Chinese              36
Spanish              13          Spanish              17
Chinese              12          Japanese             16
Arabic               12          Arabic               16
Korean               11          Vietnamese           11
Japanese             10          Indonesian            9
Vietnamese           10          Korean                7
Tagalog              10          Greek                 7
Thai                  9          Yoruba                5
Other languages      59          Other languages      39
Total               186          Total               210
TABLE 5
TOEFL COMPARATIVE DATA

                          EXPERIMENTAL GROUP            COMPARISON GROUP              EXPERIMENTAL GROUP
                          GRADUATE (n=186)                                            UNDERGRADUATE (n=210)
                          MEAN   S.D.  RELIAB. S.E.M.*  MEAN   S.D.  RELIAB. S.E.M.*  MEAN   S.D.  RELIAB. S.E.M.*
I.   LISTENING
     COMPREHENSION        53.72  7.0?  .89     2.9      52.07  7.43  .89     2.9      53.85  6.52  .86     2.4
II.  STRUCTURE AND
     WRITTEN EXPRESSION   50.60  7.99  .83     2.7      47.32  8.71  .86     2.7      48.11  7.63  .81     2.4
III. READING AND
     VOCABULARY           52.70  7.85  .91     3.1      48.53  8.56  .92     3.3      48.66  7.40  .89     3.3
TOTAL                     523    69    .95     15       493    75    .95     17       502    63    .94     16

*Standard Error of Measurement
TABLE 6
GRE-VERBAL SCORE COMPARISONS

                                          MEAN   S.D.   RELIAB.  S.E.M.*
EXPERIMENTAL GROUP: NON-NATIVES (n=186)   274    67     .78      30
NATIVE SPEAKERS (n=1,495)                 514    128    .94      32


TABLE 7
GRE SUBPARTS

                                          VERBAL REASONING                  READING
                                          MEAN   S.D.   RELIAB.  S.E.M.*    MEAN   S.D.   RELIAB.  S.E.M.*
EXPERIMENTAL GROUP: NON-NATIVES (n=186)   5.10   7.15   .69      3.5        3.87   4.60   .47      3.0
NATIVE SPEAKERS (n=1,495)                 27.61  11.99  .92      3.5        18.49  8.55   .84      3.4

*Standard Error of Measurement


TABLE 8
TEST SPEEDEDNESS

                                   VERBAL REASONING         READING COMP.
                                   Natives  Non-natives     Natives  Non-natives
Per cent completing test           84.5     48.9            61.3     47.2
Per cent completing 75% of test    91.4     84.4            94.7     75.6
Number of items reached by 80%
  of candidates                    53       49              33       27
Total number of items              60                       40
TABLE 9
SAT AND TSWE SCORE COMPARISONS

                                          SAT VERBAL                       TSWE
                                          MEAN   S.D.   RELIAB.  S.E.M.    MEAN   S.D.   RELIAB.  S.E.M.
EXPERIMENTAL GROUP: NON-NATIVE (n=210)    269    67     .77      ?         28     8.8    ?        ?
NATIVE (n=1,765)                          ?      106    .91      32        42.35  11.00  .89      3.7


TABLE 10
TEST SPEEDEDNESS

                                   SAT I                   SAT II                  TSWE
                                   Native  Non-native      Native  Non-native      Native  Non-native
Per cent completing test           72.5    73.5            74.5    65.5            75.4    65.0
Per cent completing 75% of test    99.2    98.5            97.4    90.5            96.0    89.5
Number of items reached by 80%
  of candidates                    42      41              39      38              47      41
Total number of items              45                      40                      50
TABLE 11
TOEFL-GRE-V INTERCORRELATIONS
(n=186)

SECTION*            I       II      III     TOTAL
I.   LIST. COMP.            .698    .723    .878
II.  STR & WE       .698            .801    .922
III. RC & VOC       .723    .801            .924
TOEFL TOTAL         .878    .922    .924
GRE-V TOTAL         .521    .612    .623    .645


TABLE 12
TOEFL-SAT-TSWE INTERCORRELATIONS
TOEFL (n=210)

SECTION*            I       II      III     TOTAL
I.   LIST. COMP.            .537    .633    .810
II.  STR & WE       .537            .769    .890
III. RC & VOC       .633    .769            .920
TOEFL TOTAL         .810    .890    .920
SAT-VERBAL          .449    .643    .681    .681
TSWE                .512    .708    .657    .720

*Section I: Listening Comprehension; Section II: Structure and Written Expression; 
Section III: Reading Comprehension and Vocabulary.
FIGURE 1

Axes: TOEFL total scaled score (horizontal); SAT verbal scaled score (vertical). Medians and smoothed medians are plotted.

                       N     MIN       MAX       MEAN       SD (N)    SD (N-1)
TOEFL: total scaled    210   350.0000  663.0000  502.1379   62.7676   62.9176
SAT verbal: scaled     210   200.0000  620.0000  269.2476   66.6509   66.8102



FIGURE 2 



Underlying Relationship Between 
Abilities Measured by Tests A & B 






FIGURE 3 

Observed Relationship When B' is an 
Overly Difficult Measure of the Trait 
Measured by B 




TEST OF ENGLISH AS A FOREIGN LANGUAGE

FIGURE 4

UNDERGRADUATE-LEVEL SAMPLE

Axes: TOEFL total scaled score (horizontal); TSWE scaled score (vertical). Smoothed medians are plotted.

                       N     MIN       MAX       MEAN       SD (N)    SD (N-1)
TOEFL: total scaled    210   350.0000  663.0000  502.1379   62.7676   62.9176
TSWE: scaled           210   20.0000   54.0000   28.0428    8.8249    8.8460

R = 0.7202
TABLE 13

OBSERVED AND TRANSFORMED CORRELATIONS
TOEFL AND TSWE

                  TSWE
TOEFL SECTION*    Observed   Log     Truncated
I                 .512       .512    .434
II                .708       .703    .630
III               .657       .660    .601
TOTAL             .720       .718    .707


TABLE 14

OBSERVED AND TRANSFORMED CORRELATIONS
TOEFL AND SAT

                  SAT-V
TOEFL SECTION*    Observed   Log     Truncated
I                 .449       .450    .452
II                .643       .637    .636
III               .681       .690    .679
TOTAL             .681       .687    .687

*Section I: Listening Comprehension; Section II: Structure and Written Expression; 
Section III: Reading Comprehension and Vocabulary.
FIGURE 5

TEST OF ENGLISH AS A FOREIGN LANGUAGE

GRADUATE-LEVEL SAMPLE

Axes: TOEFL total scaled score (horizontal); GRE verbal scaled score (vertical). Medians and smoothed medians are plotted.

                       N     MIN       MAX       MEAN       SD (N)    SD (N-1)
TOEFL: total scaled    186   337.0000  663.0000  523.3977   69.1972   69.3840
GRE verbal: scaled     186   210.0000  560.0000  273.5806   66.3221   66.5011

R = 0.6450
TABLE 15

OBSERVED AND TRANSFORMED CORRELATIONS
TOEFL AND GRE

                 GRE-V
TOEFL SECTION    Observed   Log (GRE-V)   1/GRE-V   Truncated   (TOEFL³)
I                .521       .527          .530      .533        .560
II               .612       .618          .616      .627        .657
III              .623       .648          .662      .684        .686
TOTAL            .645       .662          .663      .703        .703



TABLE 16

GRE-VERBAL PART CORRELATIONS WITH TOEFL AND (TOEFL³)

                                      TOEFL SECTION
                                      I       II      III     TOTAL
GRE I (verbal reasoning)   Observed   .487    .582    .604    .616
                           (TOEFL³)   .529    .633    .666    .674
GRE II (reading)           Observed   .453    .503    .499    .534
                           (TOEFL³)   .482    .530    .548    .577
TABLE 17

TRUNCATION POINTS FOR TOEFL AND
GRE-VERBAL SCORES

TOEFL SCORE     TRUNCATION POINT
Section I       50
Section II      44
Section III     51
Total           474
APPENDIX



Item Review 

As indicated in the earlier section describing the procedures followed 
in this study, an attempt was made to gather information from a representative 
group of specialists in English as a Second Language on the relationship 
of the various tests administered. Ten specialists representing different 
ESL programs throughout the United States were chosen to review the tests. 
Because one of the ten was not able to complete the assignment, the data 
given here are the result of the reviews of nine ESL specialists. Despite 
the small number, it was felt that the group, chosen because of their 
longstanding familiarity with all aspects of ESL training, would represent 
ESL specialists in general.

The purpose of this review was to obtain the views of specialists 
on the similarities and differences among the item types found in the 
various tests. For this reason the specific tests were not identified. 
All the items from all four tests (TOEFL, GRE, SAT, and TSWE) were first 
divided by skill or area tested. Within each of the resulting four groups 
(Reading, Vocabulary, Writing, Listening), the items from the various tests 
were first divided into groups and then randomized. Along with the items 
to be reviewed, the specialists were asked to complete a questionnaire in 
which they were asked to indicate which items they felt were the most or 
least appropriate "for testing the English proficiency of non-native 
speakers who are being evaluated for admission to full-time academic (not 
ESL) study in American colleges and universities."

The responses of the reviewers can best be described for each separate 
section. For the first group (reading), none of the reviewers chose the 
items taken from GRE-V as being appropriate for testing the reading 
proficiency of the groups described. Three chose the SAT reading items as 
the most appropriate, and the remaining six chose TOEFL reading items. 
At first glance, these results would seem to be contrary to the actual 
performance of the non-native students who participated in the study. In 
the comments included by the reviewers, however, all six who chose the SAT 
items mentioned that those items seemed preferable to the TOEFL items 
because they contained longer, more realistic reading passages for students 
who are to enter full-time academic study. Such comments are very pertinent 
in light of the inclusion of short, practical selections in the 
TOEFL reading section since the introduction of the three-part form of the 
test in 1976. In the last section of the questionnaire, in which the 
reviewers were asked to indicate their choice of the most appropriate 
items for testing reading, they showed some ambivalence about choosing 
between the TOEFL and SAT items. The reviewers felt that the level of 
difficulty of the TOEFL items was about what it should be but that the 
main weakness was their not regularly testing comprehension of extended 
passages.

In the second group of items (vocabulary), seven of the nine reviewers
selected the TOEFL vocabulary items as the most appropriate. The principal 
argument given for this choice was the use by TOEFL of sentence-length 
contexts for testing the meanings of words. The verbal analogy type of 
item was felt to be too restrictive and only indirectly related to a 
subject's knowledge of the meanings of words. 

The third group of items was entitled "Writing" and contained only two 
types of items from TOEFL (structure and written expression) and two types 
from TSWE (not named as such but roughly similar to the TOEFL items). 
The choices indicated by the reviewers did not focus on a clear division 
between the TOEFL and TSWE items. The preference in the written expression 
item type (those items requiring recognition of an error in a given sentence) 
was clearly toward the TSWE items. But, again, a particular feature of the 
items explained the choice: the TSWE uses five-choice items as 
opposed to the four-choice items used by TOEFL. The choices for the 
structure type of item were almost evenly divided between those from TOEFL 
and TSWE.

The last group included the items for testing listening comprehension. 
In this case, only TOEFL items were included because the other tests do 
not measure listening. The section was nevertheless included for review in 
order to cover all items in the tests used in the study and to get some 
feedback on the differences among the three types used in TOEFL. The 
choices were most in favor of the mini-talks (4), next for the dialogs (3), 
and least for the one-sentence rejoinders (2). On the whole, the comments 
on this section expressed a greater desire for those items that contained 
greater context. Also, the one-sentence items were considered to be less 
realistic than either of the other two for testing listening comprehension.

In summary, the choices of the group of specialists indicated a 
distinct preference for TOEFL items to test vocabulary, for TOEFL items 
and to some extent SAT items to test reading, and for a combination of TOEFL 
and TSWE type items to test writing. Although this information was surely 
peripheral to the primary purpose of the study, the comments do provide 
valuable guidance on how TOEFL might best measure the English skills 
needed by foreign students entering U.S. colleges and universities.




This Research Report is part of a series of reports on research relating 
to the Test of English as a Foreign Language. Other reports include:

The Performance of Native Speakers of English on the Test of English as a 
Foreign Language. Clark, John L.D. Report 1. November 1977.

Discusses the results of the administration of TOEFL to native speakers of 
English just prior to their graduation from a college-preparatory high 
school program. Total test score distributions were highly negatively 
skewed, reinforcing findings of earlier studies that TOEFL is not 
psychometrically appropriate for discriminating among native speakers of 
English with respect to English language competence.

An Evaluation of Alternative Item Formats for Testing English as a Foreign 
Language. Pike, Lewis W. Report 2. June 1979.

Describes an extensive research study conducted from 1972 to 1974 that was 
designed to explore possible changes in the format and content of TOEFL. 
Questions of validation, criterion selection, and content specifications 
were investigated. The report includes the results of these findings and 
discusses the implications for TOEFL content specifications and internal 
structure. This study contributed to the restructuring of TOEFL beginning 
in 1976.

An Exploration of Speaking Proficiency Measures in the TOEFL Context. 
Clark, John L.D. and Swinton, Spencer S. Report 4. October 1979.

Describes a three-year study involving the development and experimental 
administration of test formats and item types aimed at measuring the 
English-speaking proficiency of non-native speakers. Factor analysis and 
other techniques were used to identify subsets of item formats and 
individual items having satisfactory correlations with the Foreign Service 
Institute criterion interview administered to the test subjects. The 
results were grouped into a prototype "Test of Spoken English."

The above reports are currently available. Other research reports are planned. 
For further information about any of the TOEFL Research Reports, write to:

TOEFL Program Office 
Box 899 
Princeton, NJ 08541, USA



